At 8:39 PM -0400 9/16/04, Richard Connamacher wrote:
>I'm new to PostgreSQL, and from the looks of it, it's a great database,
>and I'll be using more of it in the future.
>
>I had a quick question if anyone could clear this up. The documentation
>for PostgreSQL (version 7.1, the version this server is using) says that
>it supports multibyte character encodings like Unicode (which implies
>UTF-16 encoding).
Don't confuse Unicode, the 'character set' and rules for characters,
represented by a sequence of abstract 32 bit integers, with
UTF-[8|16|32] which is a way to encode those abstract integers into a
stream of bytes someplace.
> Later on, the same page says that Unicode is
>represented using UTF-8 encoding. UTF-8 is the 8-bit version of Unicode.
>The multibyte version of Unicode is UTF-16.
>
>So, which is it? If I create a database using Unicode as the encoding,
>will the encoding be UTF-8 (singlebyte) or UTF-16 (multibyte)?
Erm... UTF-8 *is* a multibyte encoding. Up to 6 bytes per code point,
if things get really degenerate. (And, last I checked, means you can
have up to 70 bytes for really degenerate characters, but my memory
might be off (could be 80))
UTF-8, UTF-16, and UTF-32 will all encode Unicode characters just fine.
--
Dan
--------------------------------------it's like this-------------------
Dan Sugalski even samurai
dan@sidhe.org have teddy bears and even
teddy bears get drunk